World Knowledge in the Time of Large Models
Summary from SILO website
Bio: Kenneth Marino is a Research Scientist at Google DeepMind in NYC, focusing on improving knowledge-based systems such as retrieval and information extraction, as well as embodied reasoning with language. He graduated in 2021 from Carnegie Mellon University, where he was advised by Abhinav Gupta and his thesis focused on incorporating knowledge into embodied systems. He has an adjunct appointment at Columbia University, where he teaches a class on the impact of datasets on machine learning and how to collect good datasets. He received his undergraduate degree from the Georgia Institute of Technology, where he studied Computer Engineering and Computer Science.
Abstract: This talk will discuss the massive shift that has come about in the vision and ML community as a result of large pre-trained language and vision-language models such as Flamingo and GPT-4. We begin by looking at the work on knowledge-based systems in CV and robotics before the large model revolution and discuss the impact that revolution had. This impact can be broken down into three areas in which world knowledge should be studied in the context of these new models: evaluation, harnessing large models, and building outside knowledge. First, evaluating world knowledge is even more important now that the large model revolution gives easier access to world knowledge. Next, we discuss recent work in harnessing models such as Flamingo and Chinchilla for visual and procedural knowledge. Finally, the talk discusses how, by focusing on knowledge acquisition as an agent-centric problem, we can make developments in retrieving and collecting world knowledge.
Links
- About the Speaker → kennethmarino.com
- OK-VQA paper and dataset → okvqa.allenai.org/index.html
- KRISP paper → arxiv.org/abs/2012.11014
- Same Object, Different Grasps paper → arxiv.org/abs/2011.06431
- A-OKVQA dataset/GitHub → github.com/allenai/aokvqa
- Distilling Internet-Scale Vision-Language Models into Embodied Agents paper → arxiv.org/abs/2301.12507
Jump to section
- [0:00] Introducing Kenneth Marino
- [1:11] Begin presentation
- [1:37] What do we want from AI?
- [3:00] The old way: treating all tasks individually
- [3:50] Knowledge / priors matter
- [5:00] LMs have built-in knowledge
- [7:14] Prologue: Before the LLM/VLM revolution
- [8:15] Evaluating knowledge
- [9:30] Evaluating knowledge with OK-VQA
- [10:30] KRISP: Incorporating knowledge graphs
- [11:15] LLMs and VLMs: Accessible world knowledge
- [13:28] Evaluating knowledge in LLMs/VLMs
- [14:40] Many kinds of knowledge
- [16:17] Evaluating knowledge with A-OKVQA
- [22:38] From evaluating to using LLMs/VLMs
- [25:31] Extracting knowledge from LLMs
- [26:15] Bringing Flamingo’s knowledge into agents
- [35:48] The only constant is change
- [36:45] Inquisitive agents
- [40:11] Wikipedia navigation as a benchmark
- [53:00] Takeaways
- [53:55] Q&A
